Introduction to Computer Vision: Plant Seedlings Classification¶

Problem Statement¶

Context¶

In recent times, the field of agriculture has been in urgent need of modernization, since checking whether plants are growing correctly still requires an extensive amount of manual work. Despite several advances in agricultural technology, people working in the industry still need to sort and recognize different plants and weeds by hand, which takes a lot of time and effort in the long term. The potential is ripe for this trillion-dollar industry to be greatly impacted by technological innovations that cut down on the requirement for manual labor, and this is where Artificial Intelligence can genuinely benefit the workers in this field: the time and energy required to identify plant seedlings can be greatly reduced through the use of AI and Deep Learning. The ability to do so far more efficiently, and even more effectively, than experienced manual labor could lead to better crop yields, free up human involvement for higher-order agricultural decision-making, and in the long term result in more sustainable environmental practices in agriculture as well.

Objective¶

The aim of this project is to build a Convolutional Neural Network (CNN) that classifies plant seedlings into their respective categories.

Data Dictionary¶

The Aarhus University Signal Processing group, in collaboration with the University of Southern Denmark, has recently released a dataset containing images of unique plants belonging to 12 different species.

  • The dataset can be downloaded from Olympus.
  • The data file names are:
    • images.npy
    • Labels.csv
  • Due to the large volume of data, the images were converted into the images.npy file and the labels were put into Labels.csv, so that you can work on the data/project seamlessly without having to worry about the high data volume.

  • The goal of the project is to create a classifier capable of determining a plant's species from an image.

List of Species (12)

  • Black-grass
  • Charlock
  • Cleavers
  • Common Chickweed
  • Common Wheat
  • Fat Hen
  • Loose Silky-bent
  • Maize
  • Scentless Mayweed
  • Shepherds Purse
  • Small-flowered Cranesbill
  • Sugar beet

Note: Please use GPU runtime on Google Colab to execute the code faster.¶

Importing necessary libraries¶

In [ ]:
# Installing the libraries with the specified version.
# uncomment and run the following line if Google Colab is being used
# !pip install tensorflow==2.15.0 scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==1.5.3 opencv-python==4.8.0.76 -q --user
In [ ]:
# Installing the libraries with the specified version.
# uncomment and run the following lines if Jupyter Notebook is being used
#!pip install tensorflow==2.13.0 scikit-learn==1.2.2 seaborn==0.11.1 matplotlib==3.3.4 numpy==1.24.3 pandas==1.5.2 opencv-python==4.8.0.76 -q --user
In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import math

from PIL import Image
import cv2

from sklearn.model_selection import train_test_split  # Function for splitting datasets for training and testing.
from sklearn.preprocessing import LabelEncoder

from sklearn.metrics import confusion_matrix, classification_report

import tensorflow as tf

# Keras Sequential Model
from tensorflow.keras.models import Sequential

# Importing all the different layers and optimizers
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization, Activation, LeakyReLU, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam,SGD
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.models import Model

# For Transfer Learning
from tensorflow.keras.applications import InceptionV3

# For Data Augmentation
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The below code can be used to ignore the warnings that may occur due to deprecations
import warnings
warnings.filterwarnings("ignore")

Loading the dataset¶

In [3]:
# Run the below code if you are using Google Colab
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [4]:
# Load the Labels CSV file into a NumPy array
labels = np.loadtxt('/content/drive/MyDrive/Personal/UT Austin/Project 5 - CV/Labels.csv', delimiter=',', dtype=str, skiprows=1)
In [5]:
# Load the images.npy file
images = np.load('/content/drive/MyDrive/Personal/UT Austin/Project 5 - CV/images.npy')

Data Overview¶

Understand the shape of the dataset¶

In [ ]:
images.shape
Out[ ]:
(4750, 128, 128, 3)

We have 4,750 color images (3 channels) of 128x128 pixels each.

In [ ]:
labels.shape
Out[ ]:
(4750,)
In [ ]:
# Get the count of unique values
num_unique_labels = len(np.unique(labels))

print(f"Number of unique species: {num_unique_labels}")
Number of unique species: 12
In [ ]:
# Get unique values and their counts
unique_labels, counts = np.unique(labels, return_counts=True)

# Display the unique values with their counts
for label, count in zip(unique_labels, counts):
    print(f"{label}: {count}")
Black-grass: 263
Charlock: 390
Cleavers: 287
Common Chickweed: 611
Common wheat: 221
Fat Hen: 475
Loose Silky-bent: 654
Maize: 221
Scentless Mayweed: 516
Shepherds Purse: 231
Small-flowered Cranesbill: 496
Sugar beet: 385

And here we have the corresponding labels for the 4,750 images, along with the count for each unique label value. As we can see, there is some imbalance in the number of images per class.
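To quantify that imbalance, the counts printed above give an imbalance ratio (largest class over smallest) of roughly 3:1, and they sum to the expected 4,750 images:

```python
# Class counts copied from the output above
counts = {
    "Black-grass": 263, "Charlock": 390, "Cleavers": 287,
    "Common Chickweed": 611, "Common wheat": 221, "Fat Hen": 475,
    "Loose Silky-bent": 654, "Maize": 221, "Scentless Mayweed": 516,
    "Shepherds Purse": 231, "Small-flowered Cranesbill": 496, "Sugar beet": 385,
}

# Ratio of the most frequent class to the least frequent class
ratio = max(counts.values()) / min(counts.values())
print(round(ratio, 2))        # 2.96
print(sum(counts.values()))   # 4750
```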

Exploratory Data Analysis¶

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you understand the data better.
  • A thorough analysis of the data, in addition to the questions mentioned below, should be done.
  1. How do the plant images of the different categories differ from each other?
  2. Is the provided dataset imbalanced? (Check using bar plots)
In [5]:
def display_random_images(image_array, labels, rows=4, cols=6):
    """
    Display a grid of random images with their corresponding labels.

    Parameters:
    - image_array: The array containing images (shape: (num_images, height, width, channels)).
    - labels: Array of labels corresponding to the images.
    - rows: Number of rows in the plot grid (default is 4).
    - cols: Number of columns in the plot grid (default is 6).
    """
    # Define the number of unique classes (optional, you can use it elsewhere if needed)
    num_classes = len(np.unique(labels))

    # Create a figure with the specified size
    fig = plt.figure(figsize=(15, 12))

    # Loop through each subplot and display random images
    for i in range(cols):
        for j in range(rows):
            # Generate a random index to select a random image and label
            random_index = np.random.randint(0, len(labels))

            # Add a subplot to the grid
            ax = fig.add_subplot(rows, cols, i * rows + j + 1)

            # Plot the image
            ax.imshow(image_array[random_index])

            # Set the title to the corresponding label
            ax.set_title(labels[random_index])

            # Remove axis ticks for better visualization
            ax.axis('off')

    # Display the plot
    plt.show()
In [ ]:
display_random_images(images, labels, rows=4, cols=6)

The images as seen above are in BGR format. The leaves still look green because the green channel occupies the same position in both BGR and RGB. However, the rocks appear blue, which is unnatural; this is the main clue that the images are not in the RGB format expected by Matplotlib's imshow function.
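Since BGR and RGB differ only in channel order, the conversion amounts to reversing the last axis of the array; a tiny NumPy sketch of the swap that cv2.cvtColor(..., cv2.COLOR_BGR2RGB) performs (using a synthetic 2x2 image, not the dataset):

```python
import numpy as np

# Synthetic BGR image: strong blue, weak red
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 200  # blue channel (first in BGR)
bgr[..., 2] = 10   # red channel (last in BGR)

# Reversing the channel axis swaps B and R, leaving G in place
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [10, 0, 200]
```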

In [6]:
# Function to create labeled bar plots from a 1D array
def labeled_barplot(data, perc=False, n=None):
    """
    Barplot with count or percentage labels at the top.
    data: 1D array or list with categorical data.
    perc: whether to display percentages instead of counts (default is False).
    n: displays the top n categories (default is None, i.e., display all).
    """
    # Get counts of unique values
    counts = Counter(data)

    # Optionally, select the top n categories
    if n is not None:
        counts = dict(counts.most_common(n))

    categories = list(counts.keys())
    values = list(counts.values())
    total = sum(values)  # Total count of all elements

    # Plot setup
    plt.figure(figsize=(len(categories) + 2, 6))
    ax = sns.barplot(x=categories, y=values, palette="Paired")
    plt.xticks(rotation=90, fontsize=15)

    # Annotate bars with counts or percentages
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(100 * p.get_height() / total)  # Percentage
        else:
            label = int(p.get_height())  # Count
        x = p.get_x() + p.get_width() / 2
        y = p.get_height()
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points"
        )

    plt.show()
In [ ]:
labeled_barplot(labels, perc=False, n=None)

There is some mild imbalance in the number of samples for each plant species.

Data Pre-Processing¶

Convert the BGR images to RGB images.¶

In [6]:
# Initialize an empty array to store the RGB images
rgb_images = np.zeros_like(images)

# Loop through and convert each image from BGR to RGB
for i in range(images.shape[0]):
    rgb_images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB)
In [ ]:
display_random_images(rgb_images, labels, rows=4, cols=6)

After converting the images from BGR to RGB, the rocks in the image now look much more natural.

Resize the images¶

As the images are fairly large, training on them may be computationally expensive; therefore, it is preferable to reduce the image size from 128x128 to 64x64.

In [7]:
resized_images = np.zeros((rgb_images.shape[0], 64, 64, 3), dtype=images.dtype)  # Prepare an empty array

for i in range(images.shape[0]):
    img = Image.fromarray(images[i].astype('uint8'))  # Convert NumPy array to PIL image
    img_resized = img.resize((64, 64))  # Resize the image to 64x64
    resized_images[i] = np.array(img_resized)  # Convert back to NumPy array

# Loop through and convert each image from BGR to RGB
for i in range(resized_images.shape[0]):
    resized_images[i] = cv2.cvtColor(resized_images[i], cv2.COLOR_BGR2RGB)

print(f"New shape of images: {resized_images.shape}")
New shape of images: (4750, 64, 64, 3)
In [73]:
display_random_images(resized_images, labels, rows=4, cols=6)

Data Preparation for Modeling¶

  • Before you proceed to build a model, you need to split the data into train, validation, and test sets to be able to evaluate the model that you build on the train data.
  • You'll have to encode the categorical labels and scale the pixel values.
  • You will build a model using the train data and then check its performance.

Split the dataset

In [67]:
X = resized_images
y = labels

# Step 1: Split into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Resulting splits:
# X_train, y_train -> 80% of the data (training set)
# X_test, y_test -> 20% of the data (test set)

# Print the shapes of the resulting datasets
print("Training set shape:", X_train.shape, y_train.shape)
print("Test set shape:", X_test.shape, y_test.shape)
Training set shape: (3800, 64, 64, 3) (3800,)
Test set shape: (950, 64, 64, 3) (950,)

Encode the target labels¶

In [68]:
# Creating one-hot encoded representation of target labels
# we can do this by using this utility function - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical

# 1. Use LabelEncoder to encode the unique labels
label_encoder = LabelEncoder()
label_encoder.fit(labels)  # Fit on the full labels array to capture all unique values

# 2. Encode y_train and y_test using the fitted LabelEncoder
y_train_encoded = label_encoder.transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# 3. Apply one-hot encoding using tf.keras.utils.to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train_encoded)
y_test_encoded = tf.keras.utils.to_categorical(y_test_encoded)

# Verify the shapes
print("One-hot encoded y_train shape:", y_train_encoded.shape)
print("One-hot encoded y_test shape:", y_test_encoded.shape)
One-hot encoded y_train shape: (3800, 12)
One-hot encoded y_test shape: (950, 12)

Data Normalization¶

As we know, image pixel values range from 0 to 255. Here we simply divide all the pixel values by 255 so that every image has pixel values between 0 and 1.

In [69]:
# Normalizing the image pixels
X_train = X_train/255
X_test = X_test/255
In [70]:
#Verify the values are now in the 0 to 1 range
X_train[0]
Out[70]:
array([[[0.40784314, 0.35686275, 0.32156863],
        [0.39215686, 0.34901961, 0.30196078],
        [0.38039216, 0.34901961, 0.29803922],
        ...,
        [0.28627451, 0.21568627, 0.17254902],
        [0.27843137, 0.21176471, 0.15686275],
        [0.30980392, 0.24313725, 0.17647059]],

       [[0.4       , 0.35686275, 0.30980392],
        [0.4       , 0.35294118, 0.30196078],
        [0.38431373, 0.35294118, 0.30196078],
        ...,
        [0.30980392, 0.24705882, 0.2       ],
        [0.30588235, 0.24705882, 0.19215686],
        [0.30588235, 0.24313725, 0.18431373]],

       [[0.40784314, 0.36862745, 0.31372549],
        [0.40784314, 0.36862745, 0.30980392],
        [0.4       , 0.36862745, 0.31764706],
        ...,
        [0.30980392, 0.24705882, 0.20392157],
        [0.30588235, 0.24705882, 0.2       ],
        [0.31372549, 0.25490196, 0.2       ]],

       ...,

       [[0.42352941, 0.38431373, 0.34509804],
        [0.40392157, 0.37647059, 0.32941176],
        [0.34117647, 0.30588235, 0.25490196],
        ...,
        [0.40392157, 0.3372549 , 0.27058824],
        [0.40392157, 0.3372549 , 0.26666667],
        [0.40392157, 0.33333333, 0.2627451 ]],

       [[0.41568627, 0.36862745, 0.33333333],
        [0.40392157, 0.37254902, 0.32941176],
        [0.33333333, 0.29803922, 0.24705882],
        ...,
        [0.38431373, 0.31372549, 0.25490196],
        [0.37647059, 0.30980392, 0.24705882],
        [0.38431373, 0.31372549, 0.23921569]],

       [[0.40784314, 0.36078431, 0.3372549 ],
        [0.39607843, 0.36078431, 0.33333333],
        [0.34117647, 0.29803922, 0.2627451 ],
        ...,
        [0.37647059, 0.30980392, 0.25490196],
        [0.36078431, 0.30588235, 0.24705882],
        [0.36470588, 0.30980392, 0.24313725]]])

Model Building (1)¶

In [71]:
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
In [72]:
# Fixing the seed for random number generators
import random
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)

Let's build a CNN Model.

The model has 2 main parts:

  1. The Feature Extraction layers which are comprised of convolutional and pooling layers.
  2. The Fully Connected classification layers for prediction.

The flow of our model would be as shown below:

  • Our model starts with a Conv2D layer with 64 filters of size 3x3 and the ReLU activation function. This layer takes as input an image of size (64x64x3).
  • We also use padding to keep the output shape the same as the input shape. Hence, the hyperparameter padding = 'same'.
    This layer is followed by a Max Pooling layer.
  • After this, we have 2 more pairs of Conv2D and Max Pooling layers, each with 32 filters of kernel size 3x3 and a pooling size of (2,2).
  • We flatten the output from the last pooling layer and use a dense layer over that.
  • This is a fully connected layer of 100 neurons.
  • We would have an output layer with 12 neurons, as we have 12 output classes in this multi-class classification problem.
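As an arithmetic sanity check, the parameter counts that model.summary() reports for this architecture can be derived by hand from the layer sizes described above:

```python
# Conv2D params = (kernel_h * kernel_w * in_channels + 1) * filters  (the +1 is the bias)
conv1 = (3 * 3 * 3 + 1) * 64       # 1,792  (3 input channels)
conv2 = (3 * 3 * 64 + 1) * 32      # 18,464
conv3 = (3 * 3 * 32 + 1) * 32      # 9,248

# Three 2x2 poolings shrink 64x64 to 8x8; flattening gives 8*8*32 = 2048 units
dense1 = (8 * 8 * 32 + 1) * 100    # 204,900
dense2 = (100 + 1) * 12            # 1,212

total = conv1 + conv2 + conv3 + dense1 + dense2
print(total)  # 235616
```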
In [73]:
# Intializing a sequential model
model = Sequential()

# Adding first conv layer with 64 filters and kernel size 3x3; padding 'same' keeps the output size the same as the input size
# input_shape denotes the dimensions of the input seedling images (64x64x3)
model.add(Conv2D(64, (3, 3), activation='relu', padding="same", input_shape=(64, 64, 3)))

# Adding max pooling to reduce the size of output of first conv layer
model.add(MaxPooling2D((2, 2), padding = 'same'))

model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))

# flattening the output of the conv layer after max pooling to make it ready for creating dense connections
model.add(Flatten())

# Adding a fully connected dense layer with 100 neurons
model.add(Dense(100, activation='relu'))

# Adding the output layer with 12 neurons and activation functions as softmax since this is a multi-class classification problem
model.add(Dense(12, activation='softmax'))

# Using SGD Optimizer. NOTE: SGD was tried initially, but the Adam optimizer yielded better results
# opt = SGD(learning_rate=0.01, momentum=0.9)

# Using Adam Optimizer
opt = Adam()

# Compile model
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Generating the summary of the model
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                      │ (None, 64, 64, 64)          │           1,792 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D)         │ (None, 32, 32, 64)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D)                    │ (None, 32, 32, 32)          │          18,464 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D)       │ (None, 16, 16, 32)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_2 (Conv2D)                    │ (None, 16, 16, 32)          │           9,248 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_2 (MaxPooling2D)       │ (None, 8, 8, 32)            │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten)                    │ (None, 2048)                │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 100)                 │         204,900 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 12)                  │           1,212 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 235,616 (920.38 KB)
 Trainable params: 235,616 (920.38 KB)
 Non-trainable params: 0 (0.00 B)
In [74]:
history_1 = model.fit(
            X_train, y_train_encoded,
            epochs=15,
            validation_split=0.1,
            shuffle=True,
            batch_size=64,
            verbose=2
)
Epoch 1/15
54/54 - 5s - 91ms/step - accuracy: 0.1444 - loss: 2.4182 - val_accuracy: 0.2605 - val_loss: 2.2661
Epoch 2/15
54/54 - 1s - 11ms/step - accuracy: 0.3275 - loss: 2.0030 - val_accuracy: 0.4368 - val_loss: 1.6214
Epoch 3/15
54/54 - 1s - 10ms/step - accuracy: 0.4427 - loss: 1.5981 - val_accuracy: 0.5158 - val_loss: 1.3068
Epoch 4/15
54/54 - 1s - 10ms/step - accuracy: 0.5129 - loss: 1.3850 - val_accuracy: 0.5342 - val_loss: 1.2501
Epoch 5/15
54/54 - 1s - 12ms/step - accuracy: 0.5865 - loss: 1.2094 - val_accuracy: 0.5947 - val_loss: 1.0731
Epoch 6/15
54/54 - 1s - 10ms/step - accuracy: 0.6365 - loss: 1.0474 - val_accuracy: 0.6342 - val_loss: 0.9915
Epoch 7/15
54/54 - 1s - 12ms/step - accuracy: 0.6754 - loss: 0.9359 - val_accuracy: 0.6658 - val_loss: 0.8916
Epoch 8/15
54/54 - 1s - 10ms/step - accuracy: 0.6968 - loss: 0.8667 - val_accuracy: 0.7105 - val_loss: 0.8351
Epoch 9/15
54/54 - 1s - 12ms/step - accuracy: 0.7354 - loss: 0.7709 - val_accuracy: 0.7158 - val_loss: 0.8119
Epoch 10/15
54/54 - 1s - 12ms/step - accuracy: 0.7535 - loss: 0.7124 - val_accuracy: 0.7237 - val_loss: 0.7920
Epoch 11/15
54/54 - 1s - 11ms/step - accuracy: 0.7725 - loss: 0.6672 - val_accuracy: 0.7368 - val_loss: 0.7389
Epoch 12/15
54/54 - 1s - 12ms/step - accuracy: 0.7830 - loss: 0.6287 - val_accuracy: 0.7211 - val_loss: 0.7939
Epoch 13/15
54/54 - 1s - 21ms/step - accuracy: 0.7962 - loss: 0.5688 - val_accuracy: 0.7421 - val_loss: 0.7892
Epoch 14/15
54/54 - 1s - 12ms/step - accuracy: 0.8099 - loss: 0.5221 - val_accuracy: 0.7447 - val_loss: 0.7944
Epoch 15/15
54/54 - 1s - 12ms/step - accuracy: 0.8307 - loss: 0.4701 - val_accuracy: 0.7211 - val_loss: 0.8585
In [75]:
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
In [76]:
train_accuracy_1 = round(history_1.history['accuracy'][-1], 3)
val_accuracy_1 = round(history_1.history['val_accuracy'][-1], 3)

# Compute loss and accuracy on test data
test_loss_1, test_accuracy_1 = model.evaluate(X_test, y_test_encoded, verbose=2)
test_loss_1 = round(test_loss_1, 3)
test_accuracy_1 = round(test_accuracy_1, 3)

print(f"Test Loss: {test_loss_1}")
print(f"Test Accuracy: {test_accuracy_1}")
30/30 - 1s - 32ms/step - accuracy: 0.7053 - loss: 0.9008
Test Loss: 0.901
Test Accuracy: 0.705
In [77]:
#Confusion Matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test_encoded, axis=1)  # Decode one-hot encoded y_test_encoded

# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)

class_names = label_encoder.classes_  # Get class names from the label encoder

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
In [80]:
#Save a copy if this will be the final model
final_model = model
X_test_final = X_test
y_test_final = y_test
y_test_encoded_final = y_test_encoded
label_encoder_final = label_encoder

Model Performance Improvement¶

Reducing the Learning Rate:

Hint: Use the ReduceLROnPlateau() callback, which decreases the learning rate by some factor when the monitored loss has not decreased for a set number of epochs. Training may then resume lowering the loss at the smaller learning rate. If the loss still plateaus, the reduction is applied again in an attempt to achieve a lower loss.
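The plateau rule itself is simple to reason about; below is a plain-Python sketch of the logic (a hypothetical helper for illustration, not the Keras implementation; the notebook itself uses the actual ReduceLROnPlateau callback):

```python
def plateau_lr(losses, lr=1e-3, factor=0.5, patience=3, min_lr=1e-6):
    """Reduce lr by `factor` whenever the loss fails to improve for `patience` epochs."""
    best = float("inf")
    wait = 0
    for loss in losses:
        if loss < best:
            best, wait = loss, 0   # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:   # plateau detected: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr

# Loss improves twice, then stalls for three epochs -> one reduction: 1e-3 * 0.5
print(plateau_lr([1.0, 0.9, 0.9, 0.9, 0.9]))  # 0.0005
```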

Data Augmentation¶

Here we will use data augmentation to balance the number of training images for each label. Data augmentation is not applied to the validation/test data sets.

In [23]:
# Since we only want to perform augmentation on the training data we'll split out train, val, and test separately
X = resized_images
y = labels

# Initial split: 80% train, 20% test
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Second split: from the 80% train_val set, split off 10% for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.125, random_state=42, stratify=y_train_val
)

# Resulting splits:
# X_train, y_train -> 70% of the data (training set)
# X_val, y_val -> 10% of the data (validation set)
# X_test, y_test -> 20% of the data (test set)

# Print the shapes of the resulting datasets
print("Training set shape:", X_train.shape, y_train.shape)
print("Validation set shape:", X_val.shape, y_val.shape)
print("Test set shape:", X_test.shape, y_test.shape)
Training set shape: (3325, 64, 64, 3) (3325,)
Validation set shape: (475, 64, 64, 3) (475,)
Test set shape: (950, 64, 64, 3) (950,)
In [24]:
#Here we can see the unbalanced categories in the training data
labeled_barplot(y_train, perc=False, n=None)
In [25]:
# For all the labels determine the value of the most frequent one
unique, counts = np.unique(y_train, return_counts=True)

# Find the maximum count value across all classes
target_count = counts.max()

print("Maximum count per class (target count for augmentation):", target_count)
Maximum count per class (target count for augmentation): 458
In [26]:
# Augmentation Parameters
train_datagen = ImageDataGenerator(
    rotation_range=20,         # Random rotations
    width_shift_range=0.1,     # Horizontal shifts
    height_shift_range=0.1,    # Vertical shifts
    shear_range=0.1,           # Shear transformations
    zoom_range=0.1,            # Random zooms
    horizontal_flip=True,      # Random horizontal flips
    vertical_flip=False        # No vertical flips
)

Let's test the augmentation process on a few sample images and see what the augmentation looks like. The display_images function below is an enhanced version of display_random_images that can either plot all the images (show_random=False) or plot random samples (show_random=True).

In [27]:
def display_images(image_array, labels, rows=4, cols=6, show_random=True):
    """
    Display a grid of images with their corresponding labels.

    Parameters:
    - image_array: The array containing images (shape: (num_images, height, width, channels)).
    - labels: Array of labels corresponding to the images.
    - rows: Number of rows in the plot grid (default is 4).
    - cols: Number of columns in the plot grid (default is 6).
    - show_random: If True, displays a random subset of images; if False, displays all images.
    """
    # Determine the number of images to show
    num_images = rows * cols if show_random else len(image_array)

    # Adjust rows and columns to fit all images if show_random is False
    if not show_random:
        rows = (num_images // cols) + (num_images % cols > 0)

    # Dynamically set the figure size based on rows and cols
    fig_width = cols * 2.5  # Adjust factor as needed for image size
    fig_height = rows * 2  # Adjust factor as needed for image size
    fig = plt.figure(figsize=(fig_width, fig_height))

    # Adjust subplot spacing to reduce white space
    # plt.subplots_adjust(hspace=0.4, wspace=0.5)

    # Loop through each subplot and display images
    for i in range(num_images):
        # Select a random index if show_random is True, otherwise go sequentially
        index = np.random.randint(0, len(labels)) if show_random else i

        # Add a subplot to the grid
        ax = fig.add_subplot(rows, cols, i + 1)

        # Plot the image
        ax.imshow(image_array[index])

        # Set the title to the corresponding label
        ax.set_title(labels[index])

        # Remove axis ticks for better visualization
        ax.axis('off')

    # Display the plot
    plt.show()
In [ ]:
# Generate a small batch of augmented images
X_subset = X_train[:10]  # Taking a small subset of original images for visualization
y_subset = y_train[:10]  # Corresponding labels

# Create an iterator with a batch size equal to the subset size
augmented_iterator = train_datagen.flow(X_subset, y_subset, batch_size=len(X_subset), shuffle=False)

# Generate one batch of augmented images
X_augmented, y_augmented = next(augmented_iterator)

# Convert X_augmented to uint8
X_augmented = X_augmented.astype(np.uint8)

# Display original images
print("Original Images:")
display_images(X_subset, y_subset, rows=2, cols=5, show_random=False)

# Display augmented images
print("Augmented Images:")
display_images(X_augmented, y_augmented, rows=2, cols=5, show_random=False)
Original Images:
Augmented Images:

Now we need to augment the images so that the training data set contains an equal number of images for each plant species.

In [12]:
# Function to augment train images for a specific class to achieve a specific target_count
def augment_class(X_class, y_class, target_count):
    augmented_images = []
    augmented_labels = []
    current_count = X_class.shape[0]

    # Continue augmenting until reaching the target count
    while current_count < target_count:
        for X_batch, y_batch in train_datagen.flow(X_class, y_class, batch_size=1):
            augmented_images.append(X_batch[0])  # Append augmented image
            augmented_labels.append(y_batch[0])  # Append label
            current_count += 1
            if current_count >= target_count:
                break

    return np.array(augmented_images), np.array(augmented_labels)
In [29]:
# Loop through each class in X_train and y_train
X_train_balanced = []
y_train_balanced = []

for class_label in np.unique(y_train):
    # Extract images and labels for the current class
    X_class = X_train[y_train == class_label]
    y_class = y_train[y_train == class_label]

    # Add existing images
    X_train_balanced.extend(X_class)
    y_train_balanced.extend(y_class)

    # If the class count is less than the target, augment images
    if len(X_class) < target_count:
        X_aug, y_aug = augment_class(X_class, y_class, target_count)
        X_train_balanced.extend(X_aug)
        y_train_balanced.extend(y_aug)

# Convert lists back to arrays
X_train_balanced = np.array(X_train_balanced)
y_train_balanced = np.array(y_train_balanced)

# Convert X_train_balanced to uint8
X_train_balanced = X_train_balanced.astype(np.uint8)

print("Balanced training set shape:", X_train_balanced.shape, y_train_balanced.shape)
Balanced training set shape: (5496, 64, 64, 3) (5496,)
In [28]:
#Confirm that the classes are balanced now
labeled_barplot(y_train_balanced, perc=False, n=None)
In [30]:
# 1. Use LabelEncoder to encode the unique labels
label_encoder = LabelEncoder()
label_encoder.fit(labels)  # Fit on the full labels array to capture all unique values

# 2. Encode y_train, y_val, and y_test using the fitted LabelEncoder
y_train_encoded = label_encoder.transform(y_train_balanced)
y_val_encoded = label_encoder.transform(y_val)
y_test_encoded = label_encoder.transform(y_test)

# 3. Apply one-hot encoding using tf.keras.utils.to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train_encoded)
y_val_encoded = tf.keras.utils.to_categorical(y_val_encoded)
y_test_encoded = tf.keras.utils.to_categorical(y_test_encoded)

# Verify the shapes
print("One-hot encoded y_train shape:", y_train_encoded.shape)
print("One-hot encoded y_val shape:", y_val_encoded.shape)
print("One-hot encoded y_test shape:", y_test_encoded.shape)
One-hot encoded y_train shape: (5496, 12)
One-hot encoded y_val shape: (475, 12)
One-hot encoded y_test shape: (950, 12)
In [31]:
# Normalizing the image pixels for train, val and test
X_train_balanced = X_train_balanced/255
X_val = X_val/255
X_test = X_test/255
In [32]:
#Verify the values are now in the 0 to 1 range
X_train_balanced[0]
Out[32]:
array([[[0.38823529, 0.30980392, 0.24313725],
        [0.38039216, 0.30588235, 0.23137255],
        [0.39607843, 0.33333333, 0.25490196],
        ...,
        [0.35686275, 0.25098039, 0.14509804],
        [0.35294118, 0.24313725, 0.12941176],
        [0.36078431, 0.25882353, 0.14509804]],

       [[0.38823529, 0.30980392, 0.24313725],
        [0.38431373, 0.30588235, 0.23137255],
        [0.39215686, 0.3254902 , 0.24313725],
        ...,
        [0.3254902 , 0.21568627, 0.10980392],
        [0.3372549 , 0.22745098, 0.12156863],
        [0.37254902, 0.2745098 , 0.16862745]],

       [[0.39215686, 0.32156863, 0.25098039],
        [0.39215686, 0.31764706, 0.24313725],
        [0.38823529, 0.31764706, 0.23529412],
        ...,
        [0.34901961, 0.24705882, 0.14509804],
        [0.36078431, 0.25882353, 0.16078431],
        [0.38039216, 0.29803922, 0.19607843]],

       ...,

       [[0.31372549, 0.18823529, 0.14509804],
        [0.37254902, 0.23529412, 0.18039216],
        [0.41568627, 0.2745098 , 0.21176471],
        ...,
        [0.23137255, 0.16862745, 0.14901961],
        [0.22352941, 0.17647059, 0.15294118],
        [0.22352941, 0.18823529, 0.15294118]],

       [[0.38039216, 0.24313725, 0.18039216],
        [0.43137255, 0.29411765, 0.21960784],
        [0.47843137, 0.34509804, 0.26666667],
        ...,
        [0.22352941, 0.16862745, 0.14117647],
        [0.22352941, 0.18823529, 0.15686275],
        [0.23137255, 0.20784314, 0.17254902]],

       [[0.43529412, 0.29411765, 0.21176471],
        [0.48235294, 0.35294118, 0.26666667],
        [0.52156863, 0.40392157, 0.31372549],
        ...,
        [0.22745098, 0.18431373, 0.14901961],
        [0.22352941, 0.19215686, 0.15686275],
        [0.22745098, 0.2       , 0.16078431]]])

Model Building (2)¶

In [33]:
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
In [34]:
# Fixing the seed for random number generators
import random
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)
In [35]:
# Define the EarlyStopping callback (note: EarlyStopping was tried but did not yield better results)

early_stopping = EarlyStopping(
    monitor='val_loss',    # Metric to monitor
    patience=5,            # Number of epochs with no improvement after which training will be stopped
    restore_best_weights=True  # Restore model weights from the epoch with the best value of the monitored metric
)
In [36]:
# Define the ReduceLROnPlateau callback
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',    # Metric to monitor
    factor=0.5,            # Factor by which the learning rate will be reduced
    patience=3,            # Number of epochs with no improvement after which learning rate will be reduced
    min_lr=1e-6,           # Lower bound on the learning rate
    verbose=1              # Verbose output
)
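As a back-of-envelope check (not in the original notebook): starting from Adam's default learning rate of 1e-3, the `factor=0.5` halving above can fire at most 9 times before the next halving would fall below `min_lr=1e-6`. (Keras actually clamps the new rate at `min_lr`; this sketch just counts full halvings.)

```python
# Simulate the halving schedule implied by factor=0.5 and min_lr=1e-6,
# starting from Adam's default learning rate of 1e-3
lr, min_lr, factor = 1e-3, 1e-6, 0.5
reductions = 0
while lr * factor >= min_lr:
    lr *= factor
    reductions += 1
print(reductions, lr)  # 9 reductions, ending near 1.95e-6
```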
In [37]:
# Use the same architecture as before
# Initializing a sequential model
model = Sequential()

# Adding the first conv layer with 64 filters and kernel size 3x3; padding 'same' keeps the output size equal to the input size
# input_shape denotes the dimensions of the 64x64 RGB seedling images
model.add(Conv2D(64, (3, 3), activation='relu', padding="same", input_shape=(64, 64, 3)))

# Adding max pooling to reduce the size of output of first conv layer
model.add(MaxPooling2D((2, 2), padding = 'same'))

model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))

# flattening the output of the conv layer after max pooling to make it ready for creating dense connections
model.add(Flatten())

# Adding a fully connected dense layer with 100 neurons
model.add(Dense(100, activation='relu'))

# Adding the output layer with 12 neurons and activation functions as softmax since this is a multi-class classification problem
model.add(Dense(12, activation='softmax'))

# Using Adam Optimizer
opt = Adam()

# Compile model
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Generating the summary of the model
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                      │ (None, 64, 64, 64)          │           1,792 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D)         │ (None, 32, 32, 64)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D)                    │ (None, 32, 32, 32)          │          18,464 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D)       │ (None, 16, 16, 32)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_2 (Conv2D)                    │ (None, 16, 16, 32)          │           9,248 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_2 (MaxPooling2D)       │ (None, 8, 8, 32)            │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten)                    │ (None, 2048)                │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 100)                 │         204,900 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 12)                  │           1,212 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 235,616 (920.38 KB)
 Trainable params: 235,616 (920.38 KB)
 Non-trainable params: 0 (0.00 B)
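As a sanity check (not part of the original notebook), the summary's parameter counts follow directly from the layer shapes: a Conv2D layer has (kh·kw·in_channels + 1)·filters parameters, and a Dense layer has (in_features + 1)·units:

```python
# Recompute the summary's parameter counts from the layer shapes
conv1 = (3 * 3 * 3 + 1) * 64        # 1,792
conv2 = (3 * 3 * 64 + 1) * 32       # 18,464
conv3 = (3 * 3 * 32 + 1) * 32       # 9,248
dense1 = (8 * 8 * 32 + 1) * 100     # 204,900 (flattened 8x8x32 = 2048 inputs)
dense2 = (100 + 1) * 12             # 1,212
total = conv1 + conv2 + conv3 + dense1 + dense2
print(total)  # 235,616 -- matches the model summary
```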
In [38]:
history_2 = model.fit(
            X_train_balanced, y_train_encoded,
            epochs=20,
            validation_data=(X_val, y_val_encoded),  # Use the predefined validation data
            shuffle=True,
            batch_size=64,
            #callbacks=[early_stopping],  # Pass the early stopping callback
            callbacks=[reduce_lr],  # Pass the ReduceLROnPlateau callback
            verbose=2
)
Epoch 1/20
86/86 - 6s - 75ms/step - accuracy: 0.1521 - loss: 2.3182 - val_accuracy: 0.2632 - val_loss: 1.9404 - learning_rate: 0.0010
Epoch 2/20
86/86 - 1s - 12ms/step - accuracy: 0.3641 - loss: 1.6932 - val_accuracy: 0.4442 - val_loss: 1.4778 - learning_rate: 0.0010
Epoch 3/20
86/86 - 1s - 15ms/step - accuracy: 0.4778 - loss: 1.4012 - val_accuracy: 0.5137 - val_loss: 1.2708 - learning_rate: 0.0010
Epoch 4/20
86/86 - 1s - 13ms/step - accuracy: 0.5580 - loss: 1.1969 - val_accuracy: 0.5853 - val_loss: 1.0824 - learning_rate: 0.0010
Epoch 5/20
86/86 - 1s - 9ms/step - accuracy: 0.6210 - loss: 1.0198 - val_accuracy: 0.6126 - val_loss: 1.0113 - learning_rate: 0.0010
Epoch 6/20
86/86 - 1s - 16ms/step - accuracy: 0.6718 - loss: 0.9018 - val_accuracy: 0.6358 - val_loss: 0.9816 - learning_rate: 0.0010
Epoch 7/20
86/86 - 1s - 13ms/step - accuracy: 0.7123 - loss: 0.8047 - val_accuracy: 0.6611 - val_loss: 0.9297 - learning_rate: 0.0010
Epoch 8/20
86/86 - 1s - 9ms/step - accuracy: 0.7362 - loss: 0.7374 - val_accuracy: 0.6632 - val_loss: 0.9213 - learning_rate: 0.0010
Epoch 9/20
86/86 - 1s - 16ms/step - accuracy: 0.7411 - loss: 0.7087 - val_accuracy: 0.6884 - val_loss: 0.9028 - learning_rate: 0.0010
Epoch 10/20
86/86 - 1s - 15ms/step - accuracy: 0.7566 - loss: 0.6822 - val_accuracy: 0.6547 - val_loss: 0.9558 - learning_rate: 0.0010
Epoch 11/20
86/86 - 1s - 15ms/step - accuracy: 0.7868 - loss: 0.5882 - val_accuracy: 0.7011 - val_loss: 0.8964 - learning_rate: 0.0010
Epoch 12/20
86/86 - 1s - 12ms/step - accuracy: 0.8090 - loss: 0.5270 - val_accuracy: 0.7305 - val_loss: 0.8562 - learning_rate: 0.0010
Epoch 13/20
86/86 - 1s - 10ms/step - accuracy: 0.8197 - loss: 0.4941 - val_accuracy: 0.6779 - val_loss: 0.8867 - learning_rate: 0.0010
Epoch 14/20
86/86 - 1s - 10ms/step - accuracy: 0.8241 - loss: 0.4807 - val_accuracy: 0.7158 - val_loss: 0.8240 - learning_rate: 0.0010
Epoch 15/20
86/86 - 1s - 10ms/step - accuracy: 0.8466 - loss: 0.4309 - val_accuracy: 0.7368 - val_loss: 0.7957 - learning_rate: 0.0010
Epoch 16/20
86/86 - 1s - 9ms/step - accuracy: 0.8632 - loss: 0.3984 - val_accuracy: 0.7305 - val_loss: 0.8533 - learning_rate: 0.0010
Epoch 17/20
86/86 - 1s - 15ms/step - accuracy: 0.8779 - loss: 0.3486 - val_accuracy: 0.7011 - val_loss: 0.9345 - learning_rate: 0.0010
Epoch 18/20

Epoch 18: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
86/86 - 1s - 14ms/step - accuracy: 0.8937 - loss: 0.3038 - val_accuracy: 0.6968 - val_loss: 0.9667 - learning_rate: 0.0010
Epoch 19/20
86/86 - 1s - 9ms/step - accuracy: 0.9061 - loss: 0.2725 - val_accuracy: 0.7284 - val_loss: 0.9151 - learning_rate: 5.0000e-04
Epoch 20/20
86/86 - 1s - 15ms/step - accuracy: 0.9245 - loss: 0.2286 - val_accuracy: 0.7284 - val_loss: 0.9374 - learning_rate: 5.0000e-04
In [39]:
plt.plot(history_2.history['accuracy'])
plt.plot(history_2.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
In [40]:
train_accuracy_2 = round(history_2.history['accuracy'][-1], 3)
val_accuracy_2 = round(history_2.history['val_accuracy'][-1], 3)

# Compute loss and accuracy on test data
test_loss_2, test_accuracy_2 = model.evaluate(X_test, y_test_encoded, verbose=2)
test_loss_2 = round(test_loss_2, 3)
test_accuracy_2 = round(test_accuracy_2, 3)

print(f"Test Loss: {test_loss_2}")
print(f"Test Accuracy: {test_accuracy_2}")
30/30 - 1s - 22ms/step - accuracy: 0.7116 - loss: 1.0265
Test Loss: 1.026
Test Accuracy: 0.712
In [41]:
#Confusion Matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test_encoded, axis=1)  # Decode one-hot encoded y_test_encoded

# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)

class_names = label_encoder.classes_  # Get class names from the label encoder

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step
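A per-class complement to the confusion matrix (a sketch on toy data, not from the notebook) is sklearn's `classification_report`; here one would call it as `classification_report(y_true, y_pred, target_names=class_names)`:

```python
import numpy as np
from sklearn.metrics import classification_report

# Toy 3-class example: class 1 is fully recovered, class 0 loses one sample
y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 0])

report = classification_report(y_true_toy, y_pred_toy, output_dict=True)
print(round(report["1"]["recall"], 3))     # both class-1 samples predicted 1 -> 1.0
print(round(report["1"]["precision"], 3))  # 2 of 3 predicted-1 are correct -> 0.667
```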

Model Building (3)¶

As a final exercise, we'll try a transfer learning approach using the InceptionV3 model. InceptionV3 was chosen because it tolerates smaller input sizes such as the 128x128 images we have here. VGGNet was not used because it typically expects higher-resolution 224x224 inputs and, per the descriptions I reviewed, does not handle smaller images well. ResNet would have been another option to try.

In [8]:
# Since we only want to perform augmentation on the training data we'll split out train, val, and test separately
# Note, this time we are using the original 128x128x3 size images and not the reduced 64x64x3 ones.
X = rgb_images
y = labels

# Initial split: 80% train, 20% test
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Second split: from the 80% train_val set, split off 10% for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.125, random_state=42, stratify=y_train_val
)

# Resulting splits:
# X_train, y_train -> 70% of the data (training set)
# X_val, y_val -> 10% of the data (validation set)
# X_test, y_test -> 20% of the data (test set)

# Print the shapes of the resulting datasets
print("Training set shape:", X_train.shape, y_train.shape)
print("Validation set shape:", X_val.shape, y_val.shape)
print("Test set shape:", X_test.shape, y_test.shape)
Training set shape: (3325, 128, 128, 3) (3325,)
Validation set shape: (475, 128, 128, 3) (475,)
Test set shape: (950, 128, 128, 3) (950,)
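The two `test_size` values compose as the comments claim: 12.5% of the remaining 80% is 10% of the whole. A quick integer check against the printed shapes:

```python
total = 3325 + 475 + 950        # sample counts printed above
test = total // 5               # test_size=0.2 of the full set
val = (total - test) // 8       # test_size=0.125 of the remaining 80%
train = total - test - val
print(train, val, test)         # 3325 475 950
```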
In [9]:
# For all the labels determine the value of the most frequent one
unique, counts = np.unique(y_train, return_counts=True)

# Find the maximum count value across all classes
target_count = counts.max()

print("Maximum count per class (target count for augmentation):", target_count)
Maximum count per class (target count for augmentation): 458
In [10]:
# Augmentation Parameters
train_datagen = ImageDataGenerator(
    rotation_range=20,         # Random rotations
    width_shift_range=0.1,     # Horizontal shifts
    height_shift_range=0.1,    # Vertical shifts
    shear_range=0.1,           # Shear transformations
    zoom_range=0.1,            # Random zooms
    horizontal_flip=True,      # Random horizontal flips
    vertical_flip=False        # No vertical flips
)
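The `augment_class` helper used in the next cell is defined earlier in the notebook; a plausible sketch of it (the real helper may differ in detail) draws random batches from an `ImageDataGenerator`'s `flow()` until the class reaches the target count:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def augment_class(X_class, y_class, target_count, datagen=None):
    """Return augmented images/labels topping the class up to target_count."""
    if datagen is None:  # assumption: the notebook likely applies train_datagen here
        datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)
    needed = target_count - len(X_class)
    X_aug, y_aug = [], []
    # flow() is an endless generator of randomly transformed batches
    for batch in datagen.flow(X_class, batch_size=min(32, len(X_class))):
        for img in batch:
            if len(X_aug) >= needed:
                return np.array(X_aug), np.array(y_aug)
            X_aug.append(img)
            y_aug.append(y_class[0])  # all labels within a class are identical
```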
In [13]:
# Loop through each class in X_train and y_train
X_train_balanced = []
y_train_balanced = []

for class_label in np.unique(y_train):
    # Extract images and labels for the current class
    X_class = X_train[y_train == class_label]
    y_class = y_train[y_train == class_label]

    # Add existing images
    X_train_balanced.extend(X_class)
    y_train_balanced.extend(y_class)

    # If the class count is less than the target, augment images
    if len(X_class) < target_count:
        X_aug, y_aug = augment_class(X_class, y_class, target_count)
        X_train_balanced.extend(X_aug)
        y_train_balanced.extend(y_aug)

# Convert lists back to arrays
X_train_balanced = np.array(X_train_balanced)
y_train_balanced = np.array(y_train_balanced)

# Convert X_train_balanced to uint8
X_train_balanced = X_train_balanced.astype(np.uint8)

print("Balanced training set shape:", X_train_balanced.shape, y_train_balanced.shape)
Balanced training set shape: (5496, 128, 128, 3) (5496,)
In [14]:
# 1. Use LabelEncoder to encode the unique labels
label_encoder = LabelEncoder()
label_encoder.fit(labels)  # Fit on the full labels array to capture all unique values

# 2. Encode y_train, y_val, and y_test using the fitted LabelEncoder
y_train_encoded = label_encoder.transform(y_train_balanced)
y_val_encoded = label_encoder.transform(y_val)
y_test_encoded = label_encoder.transform(y_test)

# 3. Apply one-hot encoding using tf.keras.utils.to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train_encoded)
y_val_encoded = tf.keras.utils.to_categorical(y_val_encoded)
y_test_encoded = tf.keras.utils.to_categorical(y_test_encoded)

# Verify the shapes
print("One-hot encoded y_train shape:", y_train_encoded.shape)
print("One-hot encoded y_val shape:", y_val_encoded.shape)
print("One-hot encoded y_test shape:", y_test_encoded.shape)
One-hot encoded y_train shape: (5496, 12)
One-hot encoded y_val shape: (475, 12)
One-hot encoded y_test shape: (950, 12)
In [15]:
# Normalizing the image pixels for train, val and test
X_train_balanced = X_train_balanced/255
X_val = X_val/255
X_test = X_test/255
In [16]:
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
In [17]:
# Fixing the seed for random number generators
import random
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)
In [18]:
inception_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(128, 128, 3))

# Freeze layers
for layer in inception_model.layers:
    layer.trainable = False

#inception_model.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
87910968/87910968 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
In [19]:
# Add custom layers on top of InceptionV3 with Dropout layers
x = inception_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.2)(x)  # Adding a dropout layer with a 20% dropout rate
x = Dense(64, activation='relu')(x)
x = Dropout(0.2)(x)  # Adding another dropout layer with a 20% dropout rate
output = Dense(12, activation='softmax')(x)

# Create the new model
model = Model(inputs=inception_model.input, outputs=output)

# As this is a very complex model, we'll skip the usual summary.
# Instead, the next cell extracts some key information.
#model.summary()
In [21]:
# Get the total number of layers
total_layers = len(model.layers)
print(f"Total number of layers: {total_layers}")

# Calculate total, trainable, and non-trainable parameters
trainable_params = int(np.sum([np.prod(v.shape) for v in model.trainable_weights]))
non_trainable_params = int(np.sum([np.prod(v.shape) for v in model.non_trainable_weights]))
total_params = trainable_params + non_trainable_params

print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")
print(f"Non-trainable parameters: {non_trainable_params}")
Total number of layers: 317
Total parameters: 22074092
Trainable parameters: 271308
Non-trainable parameters: 21802784
In [52]:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
In [53]:
history_3 = model.fit(
            X_train_balanced, y_train_encoded,
            epochs=20,
            validation_data=(X_val, y_val_encoded),  # Use the predefined validation data
            callbacks=[reduce_lr],  # Pass the ReduceLROnPlateau callback
            verbose=2
)
Epoch 1/20
172/172 - 54s - 315ms/step - accuracy: 0.2615 - loss: 2.1925 - val_accuracy: 0.4189 - val_loss: 1.6930 - learning_rate: 0.0010
Epoch 2/20
172/172 - 40s - 232ms/step - accuracy: 0.4323 - loss: 1.6453 - val_accuracy: 0.4989 - val_loss: 1.4218 - learning_rate: 0.0010
Epoch 3/20
172/172 - 5s - 28ms/step - accuracy: 0.5111 - loss: 1.3926 - val_accuracy: 0.5200 - val_loss: 1.3192 - learning_rate: 0.0010
Epoch 4/20
172/172 - 5s - 30ms/step - accuracy: 0.5620 - loss: 1.2505 - val_accuracy: 0.5726 - val_loss: 1.2391 - learning_rate: 0.0010
Epoch 5/20
172/172 - 5s - 28ms/step - accuracy: 0.6064 - loss: 1.1252 - val_accuracy: 0.5853 - val_loss: 1.2743 - learning_rate: 0.0010
Epoch 6/20
172/172 - 5s - 30ms/step - accuracy: 0.6235 - loss: 1.0674 - val_accuracy: 0.5705 - val_loss: 1.2635 - learning_rate: 0.0010
Epoch 7/20
172/172 - 5s - 27ms/step - accuracy: 0.6596 - loss: 0.9604 - val_accuracy: 0.6126 - val_loss: 1.2103 - learning_rate: 0.0010
Epoch 8/20
172/172 - 5s - 27ms/step - accuracy: 0.6814 - loss: 0.9096 - val_accuracy: 0.5895 - val_loss: 1.2312 - learning_rate: 0.0010
Epoch 9/20
172/172 - 5s - 27ms/step - accuracy: 0.7054 - loss: 0.8304 - val_accuracy: 0.6021 - val_loss: 1.2061 - learning_rate: 0.0010
Epoch 10/20
172/172 - 4s - 26ms/step - accuracy: 0.7180 - loss: 0.7909 - val_accuracy: 0.6189 - val_loss: 1.2040 - learning_rate: 0.0010
Epoch 11/20
172/172 - 5s - 29ms/step - accuracy: 0.7371 - loss: 0.7474 - val_accuracy: 0.6042 - val_loss: 1.3093 - learning_rate: 0.0010
Epoch 12/20
172/172 - 5s - 26ms/step - accuracy: 0.7455 - loss: 0.7295 - val_accuracy: 0.6105 - val_loss: 1.2678 - learning_rate: 0.0010
Epoch 13/20

Epoch 13: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
172/172 - 5s - 27ms/step - accuracy: 0.7647 - loss: 0.6624 - val_accuracy: 0.5874 - val_loss: 1.3211 - learning_rate: 0.0010
Epoch 14/20
172/172 - 5s - 28ms/step - accuracy: 0.7913 - loss: 0.5789 - val_accuracy: 0.6147 - val_loss: 1.2973 - learning_rate: 5.0000e-04
Epoch 15/20
172/172 - 5s - 29ms/step - accuracy: 0.8011 - loss: 0.5551 - val_accuracy: 0.6232 - val_loss: 1.3077 - learning_rate: 5.0000e-04
Epoch 16/20

Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
172/172 - 5s - 27ms/step - accuracy: 0.8090 - loss: 0.5174 - val_accuracy: 0.6295 - val_loss: 1.3188 - learning_rate: 5.0000e-04
Epoch 17/20
172/172 - 4s - 26ms/step - accuracy: 0.8397 - loss: 0.4675 - val_accuracy: 0.6316 - val_loss: 1.2922 - learning_rate: 2.5000e-04
Epoch 18/20
172/172 - 5s - 29ms/step - accuracy: 0.8372 - loss: 0.4531 - val_accuracy: 0.6337 - val_loss: 1.2570 - learning_rate: 2.5000e-04
Epoch 19/20

Epoch 19: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
172/172 - 6s - 33ms/step - accuracy: 0.8413 - loss: 0.4490 - val_accuracy: 0.6253 - val_loss: 1.3118 - learning_rate: 2.5000e-04
Epoch 20/20
172/172 - 5s - 27ms/step - accuracy: 0.8493 - loss: 0.4253 - val_accuracy: 0.6421 - val_loss: 1.3065 - learning_rate: 1.2500e-04
In [54]:
plt.plot(history_3.history['accuracy'])
plt.plot(history_3.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
In [55]:
train_accuracy_3 = round(history_3.history['accuracy'][-1], 3)
val_accuracy_3 = round(history_3.history['val_accuracy'][-1], 3)

# Compute loss and accuracy on test data
test_loss_3, test_accuracy_3 = model.evaluate(X_test, y_test_encoded, verbose=2)
test_loss_3 = round(test_loss_3, 3)
test_accuracy_3 = round(test_accuracy_3, 3)

print(f"Test Loss: {test_loss_3}")
print(f"Test Accuracy: {test_accuracy_3}")
30/30 - 7s - 226ms/step - accuracy: 0.6758 - loss: 1.2048
Test Loss: 1.205
Test Accuracy: 0.676
In [56]:
#Confusion Matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test_encoded, axis=1)  # Decode one-hot encoded y_test_encoded

# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)

class_names = label_encoder.classes_  # Get class names from the label encoder

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
30/30 ━━━━━━━━━━━━━━━━━━━━ 12s 199ms/step

Final Model¶

The Base CNN model has the best accuracy and the least variation between train and test accuracy, suggesting it suffers the least from overfitting. It is clearly the best of the three and is selected as the Final Model.

In [57]:
train_param_1 = "235,616"
train_param_2 = "235,616"
train_param_3 = "271,308"

# Create the Markdown table as a formatted string
table_md = f"""
| Model / Acc | Train | Val | Test | Trainable Params |
|-------------|-------|-----|------|------------------|
| Base CNN | {train_accuracy_1} | {val_accuracy_1} | {test_accuracy_1} | {train_param_1} |
| Base CNN with Aug | {train_accuracy_2} | {val_accuracy_2} | {test_accuracy_2} | {train_param_2} |
| InceptionV3 with Aug | {train_accuracy_3} | {val_accuracy_3} | {test_accuracy_3} | {train_param_3} |
"""

# Display the table
from IPython.display import Markdown, display
display(Markdown(table_md))
Model / Acc Train Val Test Trainable Params
Base CNN 0.837 0.755 0.729 235,616
Base CNN with Aug 0.924 0.728 0.712 235,616
InceptionV3 with Aug 0.849 0.642 0.676 271,308

Visualizing the prediction¶

Now let's take a few sample images from the test data and run them through the final model to check its predictions.

In [90]:
# Select a few test samples
num_samples = 5  # Number of test images to check
sample_indices = np.random.choice(len(X_test_final), num_samples, replace=False)
sample_images = X_test_final[sample_indices]
sample_labels = y_test_final[sample_indices]  # Actual labels
sample_labels_encoded = y_test_encoded_final[sample_indices]  # Encoded labels for prediction
In [93]:
# Run the samples through the model to get predictions
predictions = final_model.predict(sample_images)
predicted_labels = np.argmax(predictions, axis=1)  # Decode one-hot predictions
print(sample_labels)
print(predicted_labels)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step
['Scentless Mayweed' 'Common wheat' 'Maize' 'Loose Silky-bent' 'Cleavers']
[ 8  4 11  6  2]
In [94]:
# Create label mapping using label_encoder_final
label_mapping = {index: label for index, label in enumerate(label_encoder_final.classes_)}
In [95]:
for i in range(num_samples):
    # Convert one-hot encoded labels to class indices
    true_label_index = np.argmax(sample_labels_encoded[i])
    predicted_label_index = predicted_labels[i]  # Directly use the predicted label index

    # Map indices to original string labels
    true_label = label_mapping[true_label_index]
    predicted_label = label_mapping[predicted_label_index]

    # Display the image with true and predicted labels
    plt.imshow(sample_images[i])
    plt.title(f"True: {true_label}, Predicted: {predicted_label}")
    plt.axis('off')
    plt.show()

In the above, Maize was incorrectly predicted as Sugar Beet, which, per the Confusion Matrix, is a mistake the model sometimes makes for Maize.

Actionable Insights and Business Recommendations¶

  • Our final model, the base CNN model, achieved a test accuracy of 73%.
  • Attempts to improve the accuracy using augmentation to address the class imbalance and trying a pre-built model with transfer learning did not improve the results.
  • Given the almost limitless variations that could be tried with CNNs, it might be possible to improve the accuracy